(54) Encoding, storing and transmitting digital signals



(57) A method for encoding a digital signal which is 
free from overflow and/or underflow of decoder buffer 
when video and audio data are reproduced while switch- 
ing a plurality of bit streams from one to another, is dis- 
closed. The method includes the steps of receiving a 
plurality of digital signal bit streams, detecting sizes of 
access units, as an encoding unit, of the plural digital 
signal bit streams, and decode times thereof, comparing 
the detected sizes of the access units at each decode 
time, with each other to select a maximum value thereof, 
providing a virtual stream composed of access units 
each having a size identical to the selected maximum
value at each decode time, and packetizing each of the
plural digital signal bit streams, wherein the plural digital 
signal bit streams are packetized by using padding 
packets each having a size corresponding to a differ- 
ence in size between each access unit of the plural dig- 
ital signal bit streams and that of the virtual stream, when 
the size of the access unit of the plural digital signal bit 
streams is less than the size of the access unit of the 
virtual stream. In accordance with the present invention, 
there are also provided an apparatus for encoding the 
digital signal, a method for recording the digital signal 
on a recording medium, and a method for transmitting 
the digital signal. 
Description

This invention relates to encoding, storing and 
transmitting digital signals. 

In recent years, picture signals and speech signals have
commonly been encoded according to the so-called MPEG system
after being subjected to A/D conversion. This is done both when
such signals are stored on a recording medium such as a
magneto-optical disk or magnetic tape and reproduced therefrom
to display a video image with sound, and when they are
transmitted through a given transmission line from a
transmitting side to a receiving side where the video image or
sound is reproduced, as in a video conference system, a video
telephone system or the like.

Here, "MPEG" is an abbreviation of Moving Picture Experts
Group, an organization which investigates the encoding of
moving pictures to be stored and which belongs to ISO/IEC
JTC1/SC29 (International Organization for Standardization/
International Electrotechnical Commission, Joint Technical
Committee 1/Sub-Committee 29). ISO 11172 is prescribed as the
MPEG1 standard while ISO 13818 is prescribed as the MPEG2
standard. In these international standards, the item
"multiplexing of multi-media" is standardized in ISO 11172-1
and ISO 13818-1 while the item "picture image" is standardized
in ISO 11172-3 and ISO 13818-3.

Since the picture signal and the speech signal are usually
handled at the same time, a plurality of data including the
picture signal, the speech signal and related information data
are generally multiplexed so as to be recorded and transmitted
together. When these signals are reproduced, the multiplexed
data is separated or demultiplexed into the individual kinds of
data, which are then decoded and reproduced in a synchronous
manner.

When these data are multiplexed, a given number of picture
signals and speech signals are individually encoded to produce
encoded streams, and thereafter these encoded streams are
multiplexed.

The multiplexed stream is prescribed in the MPEG system
(ISO/IEC 13818-1 or ISO/IEC 11172-1). In the following, the
structure of the decoder model and the multiplexed stream
prescribed in the MPEG system are explained. For simplicity,
the explanation is made in the context of the MPEG2
(ISO/IEC 13818-1) program stream and the MPEG1 system
(ISO/IEC 11172-1) stream. However, it will be appreciated that
the principle for decoding the MPEG2 program stream is equally
applicable to decoding of the MPEG2 system transport stream
(ISO/IEC 13818-1).

In the MPEG system, a virtual decoder model (STD: system
target decoder) is prescribed. The multiplexed stream is
defined such that the stream can be correctly decoded by the
system target decoder (STD), i.e., it can be decoded without
causing an inappropriate operative condition of the buffer such
as overflow or underflow of data.



Next, the operation of the system target decoder (STD) is
described. Fig. 1 illustrates a schematic arrangement of an
example of the system target decoder STD. Figs. 2A and 2B
illustrate the structures of the MPEG2 program stream and the
MPEG2 transport stream, respectively.

The system target decoder STD includes a reference clock
called a system time clock (STC) 16, which is advanced in
predetermined increments. On the other hand, the MPEG2 system
program stream is composed of a plurality of access units. The
stream carries time information called a system clock reference
(SCR), which is encoded in a region called a pack header, as
shown in Figs. 2A and 2B. When the time of the STC 16 is equal
to the SCR, the decoder reads out the corresponding pack, i.e.,
a unit of the program stream, at a predetermined rate, i.e.,
the rate encoded in the "mux_rate" field of the pack header.

The read-out pack is immediately separated or demultiplexed
into respective elementary streams such as a video stream and
an audio stream by means of a demultiplexer 11 depending upon
the kind of each packet, which is a unit of the pack. The
respective demultiplexed elementary streams are input to the
corresponding decoder buffers, i.e., a video buffer 12 and an
audio buffer 14.

The packet header has fields for time information called a
decoding time stamp (DTS) and a presentation time stamp (PTS),
as shown in Figs. 2A and 2B. The time information represents
the decode time and the presentation time of a decoding unit
(access unit) of each elementary stream. Specifically, the PTS
represents a time at which the access unit is displayed, and
the DTS represents a time at which the access unit is decoded.
However, in the case of an access unit whose DTS and PTS are
equal to each other, only the PTS is encoded. When the value of
the STC is equal to the value of the DTS, the access unit input
into the video buffer 12 or the audio buffer 14 is read out
therefrom and supplied to the respective decoder, i.e., a video
decoder 13 or an audio decoder 15, so as to be decoded.

Thus, in the system target decoder STD, since the decode time
information relative to the common reference clock (STC) 16 is
encoded in the packet of each elementary stream, video data,
audio data or other data can be reproduced in a synchronous
manner.

In addition, upon multiplexing, it is required that the system
clock reference SCR, which defines the supply time of the pack
to the system target decoder STD, be determined so as not to
cause overflow or underflow of data in the buffers for the
respective elementary streams in the system target decoder STD,
and that the access units be packetized accordingly.
Incidentally, overflow means that the data supplied to the
decoder exceeds the capacity of the buffer, while underflow
means that the access unit to be decoded is not completely
supplied to the buffer at the decode time thereof.
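
The two failure conditions just defined can be illustrated with a small sketch. It is not part of the MPEG specification or of the described apparatus: the names `AccessUnit` and `check_buffer` are hypothetical, and the model assumes, as a simplification, that each access unit enters the buffer all at once and leaves it all at once at its decode time.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    size: int            # bytes occupied in the decoder buffer
    arrival_end: float   # time at which the last byte has entered the buffer
    dts: float           # decode time: the unit is removed from the buffer here


def check_buffer(units, capacity):
    """Classify a schedule as 'underflow', 'overflow' or 'ok'.

    Simplification (assumption): each unit enters the buffer all at once at
    arrival_end and leaves all at once at dts.
    """
    # Underflow: a unit is not completely in the buffer at its decode time.
    if any(u.arrival_end > u.dts for u in units):
        return "underflow"

    # Time-ordered list of buffer changes. The middle tuple element makes an
    # arrival sort before a removal that happens at the same instant.
    events = []
    for u in units:
        events.append((u.arrival_end, 0, u.size))
        events.append((u.dts, 1, -u.size))

    occupancy = 0
    for _, _, delta in sorted(events):
        occupancy += delta
        if occupancy > capacity:   # overflow: more data than the buffer holds
            return "overflow"
    return "ok"


units = [AccessUnit(100, 0.0, 1.0), AccessUnit(100, 0.5, 2.0)]
print(check_buffer(units, capacity=200))
print(check_buffer(units, capacity=150))
print(check_buffer([AccessUnit(100, 1.5, 1.0)], capacity=200))
```

With these made-up numbers the first schedule fits, the second exceeds the smaller buffer, and the third delivers its access unit after the decode time.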

In the foregoing, the MPEG2 program stream shown in Fig. 2A is
explained. Incidentally, the MPEG2 transport stream shown in
Fig. 2B has the same structure as that of the MPEG2 program
stream. A transport stream header as shown in Fig. 2B is
constituted by four bytes from the synchronous byte (sync_byte)
to the continuity counter, which is prescribed in the
afore-mentioned ISO/IEC 13818-1. The clock reference and the
decode time have the same meanings as those of the MPEG2
program stream shown in Fig. 2A.

The MPEG video data have a structural unit called a Group of
Pictures (GOP). This structural unit can be encoded
independently, i.e., the GOP can be encoded such that, when the
GOP is decoded, no picture in the preceding GOP is required.
Accordingly, if a plurality of video streams are present, they
can be switched using one or more GOPs as the unit of
switching.

In the following, there is considered the case where two
different kinds of program streams, which have been encoded
under the afore-mentioned conditions, i.e., under such
conditions that the video stream is encoded every GOP, are
independently multiplexed. At this time, however, there is a
limitation that the boundary of each GOP must not be present
within a video packet, namely, video data of the pictures
immediately before and after a GOP boundary do not exist within
one video packet.

Figs. 3A through 3C illustrate an example of the case where two
program streams are independently multiplexed under the
afore-mentioned conditions, and an example of the case where
the two program streams are selectively switched from one to
another and outputted. As shown in Fig. 3A, the data of GOP0 in
video stream V0 are multiplexed over packs PK0 and PK1 of a
program stream PS0, and the data of GOP1 of the video stream V0
are multiplexed over packs PK2 and PK3 of the program stream
PS0. On the other hand, as shown in Fig. 3B, the data of GOP0
in video stream V1 are multiplexed over packs PK0, PK1 and PK2
of program stream PS1, and the data of GOP1 of the video stream
V1 are multiplexed over a pack PK3 of the program stream PS1.

The two program streams independently multiplexed as shown in
Figs. 3A and 3B are stored on a common recording medium.
Consider now a system in which the two program streams thus
stored are outputted in units of one or more packs while being
selectively switched, for example, by using the read-out device
10 shown in Fig. 1. In such a system, the afore-mentioned
independent GOP (Group of Pictures) arrangement enables the
video data to be continuously reproduced in a seamless manner
when the outputted program streams are switched at switching
points.

For example, as shown in Fig. 3C, when the packs PK0 and PK1 of
the program stream PS0 are read out and thereafter the pack PK3
of the program stream PS1 is continuously read out, the GOP0 of
the video stream V0 and then the GOP1 of the video stream V1
are inputted into the video buffer 12 shown in Fig. 1, so that
the video image can be continuously reproduced even if it is
switched between the video streams V0 and V1. Although this
example describes the case where two different program streams
are stored on the recording medium, it will be appreciated that
the same effects can be obtained when three or more program
streams are used. Hereinafter, packs corresponding to these
switching points between the GOPs are referred to as entry
points.

Meanwhile, in the case where a plurality of program 
streams are stored in a recording medium and a read- 
out device has a function for reading out the program 
streams while being switched from one to another at en- 
try points, there occasionally arises an inconvenience 
that these program streams cannot be correctly decod- 
ed by a decoder if such plural program streams to be 
stored on the recording medium are independently mul- 
tiplexed as in conventional methods. This is caused by 
the following two reasons. 

Reason I: Inconsistency of system clock reference 
(SCR): 

The system clock reference (SCR) encoded in the pack header
represents a read-out start time of the pack data inputted to
the decoder. For this reason, the system clock references (SCR)
of two adjacent packs to be read out and input to the decoder
need to satisfy the following condition:

(SCR encoded in the latter pack) > [(SCR encoded in the former
pack) + (transfer time of the former pack)], namely,

(SCR encoded in the latter pack) > [(SCR encoded in the former
pack) + (data length of the former pack)/(read-out rate)]

Accordingly, even though the afore-mentioned condition can be
satisfied when the program stream PS0 is read out in the order
of PK0, PK1, PK2, PK3 ... (namely, even though the individual
program streams are multiplexed so as to satisfy the
afore-mentioned condition), if the program streams are switched
from one to another at the entry points, for example, such that
the packs PK0 and PK1 of the program stream PS0 are first read
out, and then the pack PK3 of the program stream PS1 is read
out, as shown in Fig. 3C, there occasionally arises a problem
that the afore-mentioned condition is no longer satisfied
because the program streams PS0 and PS1 are multiplexed
separately from each other. That is, when the program streams
are read out in the afore-mentioned order, the system time
clock (STC) upon the termination of reading-out of the former
pack becomes larger than the value of the system clock
reference encoded in the latter pack, so that it is impossible
to read out the data of the latter pack.
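
The SCR condition for adjacent packs can be checked mechanically. The following is only a sketch: the `Pack` type, the function `scr_violations`, the unit choices (seconds and bytes) and all numeric values are assumptions made for illustration and are not taken from Figs. 3A through 3C.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    scr: float     # system clock reference, in seconds (assumed unit)
    length: int    # data length of the pack, in bytes


def scr_violations(packs, read_out_rate):
    """Return each index i for which pack i+1 violates the SCR condition
    (SCR of latter) > (SCR of former) + (length of former) / (read-out rate).
    """
    bad = []
    for i in range(len(packs) - 1):
        former, latter = packs[i], packs[i + 1]
        transfer_time = former.length / read_out_rate
        if not latter.scr > former.scr + transfer_time:
            bad.append(i)
    return bad


RATE = 50_000  # bytes per second, hypothetical read-out rate

# Two independently multiplexed streams, each consistent on its own.
ps0 = [Pack(0.0, 2048), Pack(0.1, 2048), Pack(0.2, 2048), Pack(0.3, 2048)]
ps1 = [Pack(0.0, 2048), Pack(0.045, 2048), Pack(0.09, 2048), Pack(0.135, 2048)]

# Read PK0 and PK1 of PS0, then switch to PK3 of PS1.
switched = [ps0[0], ps0[1], ps1[3]]

print(scr_violations(ps0, RATE))
print(scr_violations(ps1, RATE))
print(scr_violations(switched, RATE))
```

In this made-up data each stream passes the check when read in order, while the switched sequence fails at the switching point because PK3 of PS1 carries an SCR earlier than the end of transfer of PK1 of PS0.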
Reason II: Inappropriate operative condition of buffer
(Overflow and/or underflow of data in buffer):

When the program streams to be read out are switched from one
to another by the read-out device, an inappropriate operative
condition of the decoder buffer such as overflow or underflow
is likely to occur.

The afore-mentioned Reason II is explained in detail below by
referring to Figs. 4A through 4C. Figs. 4A through 4C
illustrate the transition in the amount of memory of the video
decoder buffer occupied by the data. Fig. 4A shows the
condition of the decoder buffer where, for example as shown in
Fig. 3A, the program stream PS0 is regularly read out in the
order of the packs PK0, PK1, PK2, PK3 and so on. In Fig. 4A,
the region (a) represents the amount of data in the buffer
which is occupied by the GOP0 of the video stream V0 and the
region (b) represents the amount of data in the buffer which is
occupied by the GOP1 of the video stream V0. Fig. 4B shows the
condition of the decoder buffer where, for example as shown in
Fig. 3B, the program stream PS1 is regularly read out in the
order of the packs PK0, PK1, PK2, PK3 and so on. In Fig. 4B,
the region (c) represents the amount of data in the buffer
which is occupied by the GOP0 of the video stream V1 and the
region (d) represents the amount of data in the buffer which is
occupied by the GOP1 of the video stream V1. Each of the
program streams shown in Figs. 4A and 4B is continuously
formed. Therefore, the program streams are multiplexed such
that the decoder buffer does not suffer an inappropriate
operative condition such as overflow or underflow. However, if
the multiplexed program streams are read out while being
switched from one to another by the read-out device such that
the packs PK0 and PK1 of the program stream PS0 are first read
out in this order and then the pack PK3 of the program stream
PS1 is read out, as shown in Fig. 3C, the decoder buffer is
supplied first with the data of the GOP0 of the video stream V0
and then with the data of the GOP1 of the video stream V1. As a
result, the amount of data occupied by these GOPs in the
decoder buffer is in the condition shown in Fig. 4C. In
Fig. 4C, the region (e) represents the amount of data occupied
by the GOP0 of the video stream V0 and the region (f)
represents the amount of data occupied by the GOP1 of the video
stream V1.

When the data of the GOP1 of the video stream V1 is decoded,
the read-out thereof is determined by the system clock
reference (SCR) while the pulling-out of the data from the
decoder buffer is determined by the decoding time stamp (DTS),
so that the timings of reading-out and pulling-out of the data
from the decoder buffer are as shown by the region (f) in
Fig. 4C, thereby causing overflow of the decoder buffer.

Various respective aspects of the invention are defined in the
appended claims.

Accordingly, embodiments of the present invention can provide a
method and an apparatus for encoding a digital signal, a
recording medium for storing the digital signal and a method
and an apparatus for transmitting the digital signal, which are
prevented from causing misalignment of system clock references
when a plurality of program streams are read out while being
switched from one to another at entry points thereof, and are
capable of reading out the program streams without an
inappropriate operative condition of a decoder buffer such as
overflow or underflow of data in the buffer.

In an aspect of the present invention, there is provided a
method for encoding a digital signal, which method includes the
steps of receiving a plurality of digital signal bit streams,
detecting sizes of access units, as an encoding unit, of the
plural digital signal bit streams, and decode times thereof,
comparing the detected sizes of the access units at each decode
time, with each other to select a maximum value thereof,
providing a virtual stream composed of access units each having
a size identical to the selected maximum value at each decode
time, and packetizing each of the plural digital signal bit
streams, wherein the plural digital signal bit streams are
packetized by using padding packets each having a size
corresponding to a difference in size between each access unit
of the plural digital signal bit streams and that of the
virtual stream, when the size of the access unit of the plural
digital signal bit streams is less than the size of the access
unit of the virtual stream.

In another aspect of the present invention, there is 
provided an apparatus for encoding a digital signal, 
which includes a receiving terminal for receiving a plu- 
rality of digital signal bit streams, an access unit detect- 
ing device for detecting sizes of access units, as an en- 
coding unit, of the plural digital signal bit streams, and 
decode times thereof, a maximum value detecting de- 
vice for comparing the detected sizes of the access units 
at each decode time, with each other to select a maxi- 
mum value thereof, a scheduler for providing a virtual 
stream composed of access units each having a size 
identical to the selected maximum value at each decode 
time; and a packetizing device for packetizing each of 
the plural digital signal bit streams, wherein the plural 
digital signal bit streams are packetized by using pad- 
ding packets each having a size corresponding to a dif- 
ference in size between each access unit of the plural 
digital signal bit streams and that of the virtual stream, 
when the size of the access unit of the plural digital sig- 
nal bit streams is less than the size of the access unit of 
said virtual stream. 

In another aspect of the present invention, there is 
provided a method for transmitting a digital signal, which 
method includes the steps of receiving a plurality of dig- 
ital signal bit streams, detecting sizes of access units, 
as an encoding unit, of the plural digital signal bit 
streams, and decode times thereof, comparing the de- 
tected sizes of the access units at each decode time, 
with each other to select a maximum value thereof, pro- 
viding a virtual stream composed of access units each 
having a size identical to the selected maximum value 
at each decode time, and packetizing each of the plural
digital signal bit streams, wherein the plural digital signal 
bit streams are packetized by using padding packets 
each having a size corresponding to a difference in size 
between each access unit of the plural digital signal bit 
streams and that of the virtual stream, when the size of 
the access unit of the plural digital signal bit streams is 
less than the size of the access unit of said virtual 
stream, thereby transmitting one stream produced by 
packetizing the plural digital signal bit streams. 

In another aspect of the present invention, there is
provided a method for storing a recording signal on a 
recording medium, which includes the steps of receiving 
a plurality of digital signal bit streams, detecting sizes of 
access units, as an encoding unit, of the plural digital 
signal bit streams, and decode times thereof, comparing 
the detected sizes of the access units at each decode 
time, with each other to select a maximum value thereof, 
providing a virtual stream composed of access units 
each having a size identical to the selected maximum 
value at each decode time, and packetizing each of the 
plural digital signal bit streams, wherein the plural digital 
signal bit streams are packetized by using padding 
packets each having a size corresponding to a differ- 
ence in size between each access unit of the plural dig- 
ital signal bit streams and that of said virtual stream, 
when the size of the access unit of the plural digital sig- 
nal bit streams is less than the size of the access unit of 
said virtual stream. 

The invention will now be described by way of ex- 
ample with reference to the accompanying drawings, 
throughout which like parts are referred to by like refer- 
ences, and in which: 

Fig. 1 is a block diagram schematically showing an
arrangement of a system target decoder (STD) prescribed
in a so-called MPEG standard;
Figs. 2A and 2B are views showing examples of
arrangements of a program stream and a transport stream
prescribed in a so-called MPEG standard, respectively;
Figs. 3A through 3C are views showing examples of streams
obtained by independently multiplexing two different
program streams, and a stream obtained by switching these
two program streams from one to another, respectively;
Figs. 4A through 4C are graphs explaining the behavior of
the buffer observed when program streams are switched
from one to another;
Fig. 5 is a block diagram schematically showing an
arrangement of a digital signal encoding apparatus
according to a preferred embodiment of the present
invention;
Fig. 6 is a graph explaining a manner for determining
access units of a virtual video;
Figs. 7A through 7E are views showing an example of a
program stream produced according to the preferred
embodiment of the present invention;
Fig. 8 is a view explaining the behavior of the buffer
observed when the program stream produced according to
the preferred embodiment of the present invention is
decoded; and
Figs. 9A through 9I are views showing a program stream
obtained when the preferred embodiment of the present
invention is applied to video data provided at two
different camera angles.

The preferred embodiments of the present inven- 
tion are described below by referring to the accompa- 
nying drawings. 

Fig. 5 is a block diagram schematically showing a digital
signal encoding device according to a preferred embodiment of
the present invention. In this embodiment, it is assumed that a
plurality of videos provided from the same scene but at
different camera angles, for example, three different video
streams VS0 to VS2, are encoded.

Since these video streams are provided from the same scene, the
other data to be multiplexed over the respective three video
streams VS0 to VS2, which include, for example, audio data or
superimposed data (an audio stream AS or an additional stream
such as TS), may be identical to each other. For this reason,
one kind of stream for each of the other data can be
multiplexed with each of the three video streams VS0 to VS2.
When the finally-produced program streams PS0 to PS2 are
reproduced on a decoder side while being switched from one to
another every GOP (Group of Pictures) or GOPs, in order to
assure continuous reproduction at each switching point, it is
required that the time and the field parity at the switching
point (i.e., whether it starts from a top field or a bottom
field) are identical between the respective video streams. To
fulfill the afore-mentioned requirement, the picture types, top
field first flag and repeat first field flag of the videos
provided at different camera angles can be coded identically.
In practice, however, these are not necessarily coded
identically.

Incidentally, the top field first flag and the repeat first
field flag are defined as follows in MPEG2. The top field first
flag is a flag which indicates which field, the top field or
the bottom field, should be output first when an interlaced
picture is displayed on a monitor. On the other hand, the
repeat first field flag is a flag which indicates the field
excluded when signals produced by the 3:2 pull-down method are
encoded, for example, in the case where a film source (24
frames per second) such as a movie film is converted into
interlaced video signals (30 frames per second).

In this embodiment according to the present inven- 
tion, the video streams provided at different camera an- 
gles can be multiplexed such that the video buffer is 
maintained in the same condition after the access unit 
is pulled out of the buffer and then decoded, even 
though any of the video streams is intended to be de- 
coded. This makes it possible to maintain the same con- 
dition of the buffer even when the program streams are 
switched from one to another every GOP or GOPs. As
a result, the video streams can be reproduced in a 
seamless manner without inappropriate operative con- 
dition of the buffer. 

In the following, the system used in the embodiment according
to the present invention is described.

In Fig. 5, there are shown access unit detectors 50, 51 and 52a
to 52c which can detect the size of each access unit of each
elementary stream and the decode time (DTS) of the access unit
(if the decode time is different from the presentation time
(PTS), the presentation time is also detected). In the case
where the encoder for the elementary stream and the multiplexer
are formed as an integral system, it is highly likely that the
afore-mentioned information can be output from the encoder. In
such a case, output values of the encoder can be used for the
detection.

In Fig. 5, an audio stream AS is supplied to the access unit
detector 50; three video streams VS0, VS1 and VS2 provided at
different camera angles are supplied to the access unit
detectors 52a, 52b and 52c, respectively; and the other stream
TS is supplied to the access unit detector 51. The respective
access unit detectors 50, 51 and 52a to 52c can detect the size
of the access unit of each elementary stream and the decode
time DTS and, if necessary, the presentation time PTS.

The access unit detectors 52a, 52b and 52c supply the data
indicating the size of each access unit of the three video
streams VS0, VS1 and VS2 to a maximum value detector 53, where
the sizes of the access units are compared with each other to
select the maximum size of the access units at every decode
time.

Fig. 6 illustrates the manner for determining the maximum size
of the access units of the three video streams VS0, VS1 and
VS2. For better understanding, in Fig. 6, the grouped access
units of the respective video streams VS0, VS1 and VS2 at each
decode time are shown as being displaced along the direction of
the abscissa (time axis). However, as will be appreciated, each
group shows the sizes of the access units which are present at
the same time, i.e., at each decode time tn (n=1, 2, ...). At
each decode time, an access unit whose size is identical to the
maximum size of the access units of the video streams VS0, VS1
and VS2 is selected for a virtual video stream PVS. The virtual
video stream PVS is thus regarded as having these decode
intervals and access unit sizes.
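
The maximum value selection performed at each decode time can be sketched as follows. The function name `virtual_stream_sizes` and the access unit sizes are hypothetical; the sketch assumes that every stream supplies exactly one access unit per decode time, listed in decode order.

```python
def virtual_stream_sizes(streams):
    """Return, for each decode time, the largest access-unit size found
    among the given streams (one list of sizes per stream)."""
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all streams need one access unit per decode time")
    # zip(*streams) groups the access units that share a decode time.
    return [max(sizes) for sizes in zip(*streams)]


# Hypothetical access-unit sizes (bytes) at decode times t1, t2, t3.
vs0 = [30, 10, 20]
vs1 = [25, 15, 20]
vs2 = [10, 12, 40]

pvs = virtual_stream_sizes([vs0, vs1, vs2])
print(pvs)
```

For these values the virtual stream takes its first access unit size from VS0, its second from VS1 and its third from VS2.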

A scheduler 54 receives, from the access unit detectors 50 and
51, information concerning the size of access unit and the
decode time for the other elementary streams to be multiplexed
over the virtual video stream PVS, which include the audio
stream AS and the other stream TS such as superimposition data,
and outputs control information such as the kind of elementary
stream to be packetized, the size of the packet, the system
clock reference (SCR) added to the packet or the like. The
scheduler 54 may be the same as those used for ordinary
multiplexing systems. For instance, there can be used the
scheduling technique disclosed in the specification and
drawings of Japanese Patent Application Laid-open No.
Hei-7-341951. Further, other scheduling techniques may be used.

The control information output from the scheduler 54 is
supplied to packetizing units 55a, 55b and 55c to packetize the
elementary streams. In this case, since the scheduler 54
schedules the virtual video stream (virtual video) PVS, even if
the control information produced from the scheduler is used to
packetize the actual video streams (actual video), the occupied
amount of the buffer upon withdrawal of the access unit
therefrom is not necessarily the same. However, the following
relationship is established:

[Size of access unit of actual video] ≤ [Size of access unit of
virtual video]

Accordingly, in the case where the size of the access unit of
the actual video is less than the size of the access unit of
the virtual video upon packetizing, a padding packet is
packetized to make up the difference, so that the condition of
the buffer after the withdrawal (decoding) of the data
therefrom can be kept constant irrespective of the difference
in camera angle between the video streams.
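
The amount of padding needed per access unit follows directly from this size relationship. The sketch below uses a hypothetical function `padding_sizes` and made-up sizes, and it ignores the header bytes that a real padding packet would carry.

```python
def padding_sizes(actual, virtual):
    """Return the padding (bytes) needed at each decode time so that the
    actual access unit plus padding matches the virtual access unit."""
    if len(actual) != len(virtual):
        raise ValueError("streams must cover the same decode times")
    pads = []
    for a, v in zip(actual, virtual):
        if a > v:
            # Cannot happen if the virtual size is the maximum over all streams.
            raise ValueError("actual access unit exceeds virtual access unit")
        pads.append(v - a)   # zero means no padding packet is needed
    return pads


virtual = [30, 15, 40]   # hypothetical virtual video access-unit sizes
actual = [25, 15, 20]    # hypothetical sizes for one camera angle

print(padding_sizes(actual, virtual))
```

A result of zero at a decode time means the actual access unit already has the maximum size and no padding packet is generated for it.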

The operation of these packetizing units 55a 
through 55c is explained by referring to Figs. 7A to 7E. 

Fig. 7A illustrates a size of an access unit PVAU of
the virtual video and Fig. 7B illustrates a size of an ac- 
cess unit VAU of the video to be actually multiplexed. In 
Fig. 7C, the solid lines in the virtual video show packets 
divided by the multiplexing scheduler. When the actual 
video is packetized into respective video packets VP by 
using the information output from the scheduler, the por- 
tions as indicated with hatching in Fig. 7C are required 
to be compensated, because the size of the access unit 
PVAU of the virtual video stream is different to a large 
extent from that of the access unit VAU of the actual vid- 
eo stream. The packetizing units 55a to 55c have a func- 
tion to generate padding packets PP shown in Fig. 7D 
at the times corresponding to the hatching portions 
shown in Fig. 7C. When the video stream is packetized 
as shown in Fig. 7D, the program stream as shown in 
Fig. 7E, over which the other elementary streams are 
multiplexed, can be produced. In this embodiment, one 
stream of the video data, one stream of the audio data 
and one stream of the superimposition data are multi- 
plexed. 

In addition, in the case where the difference in size 
of access unit between the virtual and actual video 
streams is compensated as mentioned above and the 
padding packet PP itself constitutes a pack, even if the 
padding packet PP is not transmitted, the transition of 
the respective elementary streams in the buffer is not 
influenced at all thereby. In order to reduce the occur- 
rence of overhead upon multiplexing, i.e., in order to pre- 
vent undesired accumulation of the data, the packetiz- 
ing units 55a to 55c have a function for withholding the 
padding packets PP from being packetized. 

That is, in this embodiment, since the padding packet 
PP indicated by "x" in Fig. 7D itself constitutes a pack, 
the padding packet PP is not packetized in the program 
stream shown in Fig. 7E. However, when it is intended 
to simplify the hardware of the packetizing units 55, all 
the padding packets PP can be packetized, though the 
overhead associated with the multiplexing is increased. 
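
The withholding of padding packets can be sketched as follows, assuming that each pack is represented as a list of (kind, size) tuples. This representation, the function name and the `keep_all` flag (standing for the simplified-hardware case) are assumptions made for illustration only.

```python
def drop_padding_only_packs(packs, keep_all=False):
    """Withhold packs that consist solely of padding.

    packs:    list of packs, each a list of (kind, size) tuples.
    keep_all: if True, every pack is kept, which models the simplified
              hardware at the cost of extra multiplexing overhead.
    """
    if keep_all:
        return list(packs)
    # A pack is kept only if it carries at least one non-padding packet.
    return [pack for pack in packs
            if any(kind != "padding" for kind, _ in pack)]


packs = [
    [("video", 4)],
    [("video", 2), ("padding", 2)],
    [("padding", 4)],          # padding packet that itself constitutes a pack
    [("audio", 3)],
]
print(drop_padding_only_packs(packs))
```

A padding packet that shares a pack with video data is kept in this sketch; only a pack made up entirely of padding is dropped.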

When the data are multiplexed by the multiplexer 
according to the present embodiment, the occupied 
amount of the buffer is varied, for example, as shown in 
Fig. 8. In Fig. 8, the dotted lines indicate the transition 
of the occupied amount of the video buffer, which is ob- 
served when the virtual video is subjected to scheduling, 
while the solid lines indicate the occupied amount of the 
video buffer, which is observed when the actual video is 
multiplexed by using the information obtained by sub- 
jecting the virtual video stream to scheduling. Inciden- 
tally, the times t1, t2, t3, ... represent decode times. 

Until the point indicated by the mark "x" in Fig. 8 is 
reached, the video buffer is loaded at the same timing 
as that of the virtual video. Further, in the region between 
the marks "x" and "O", the difference in access unit size 
between the virtual and actual video streams is replaced 
with the padding packet. In this case, no load is applied 
to the video buffer, so that the data amount in the video 
buffer is not changed until the loading initiation point of 
the next access unit indicated by the mark "O" or the 
next decode time tn (n = 1, 2, ...) is reached. 

As a result, the following relationship can be estab- 
lished at every decode time: 

[Occupied amount of buffer by actual video] ≤ 
[Occupied amount of buffer by virtual video] 

Accordingly, when the video data provided at differ- 
ent camera angles are encoded and multiplexed by the 
method according to the present invention to produce a 
plurality of program streams, all the thus-produced 
program streams satisfy the afore-mentioned relationship 
at every decode time. Therefore, if the virtual video 
is multiplexed so as not to cause any inappropriate op- 
erative condition of the decoder buffer, these program 
streams can also be decoded while being switched from 
one to another at entry points thereof without causing 
any inappropriate operative condition of the video buffer. 
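
The relationship between the two occupied amounts can be checked with a small simulation. The sketch below assumes a simplified buffer model in which each packet is loaded instantaneously at its scheduled time and a whole access unit is removed at its decode time; the numerical values are hypothetical and do not correspond to Fig. 8.

```python
def occupancy_at_decode_times(loads, au_sizes, decode_times):
    """Return the buffer occupancy just before each decode time.

    loads:        list of (time, nbytes) pairs delivered to the buffer.
    au_sizes:     au_sizes[k] is removed from the buffer at decode_times[k].
    decode_times: increasing list of decode times t1, t2, ...
    """
    occupancy = []
    removed = 0
    for k, t in enumerate(decode_times):
        loaded = sum(n for time, n in loads if time < t)
        occupancy.append(loaded - removed)
        removed += au_sizes[k]
    return occupancy


decode_times = [2, 4]
# Virtual video: access units of 8 and 6, loaded in scheduled packets.
virtual_loads = [(0, 4), (1, 4), (2, 3), (3, 3)]
virtual = occupancy_at_decode_times(virtual_loads, [8, 6], decode_times)
# Actual video: the first access unit is only 5, so the second packet
# carries 1 unit of video and 3 units of padding that never reach the buffer.
actual_loads = [(0, 4), (1, 1), (2, 3), (3, 3)]
actual = occupancy_at_decode_times(actual_loads, [5, 6], decode_times)

print(virtual, actual)
print(all(a <= v for a, v in zip(actual, virtual)))
```

Because the padding occupies the scheduled time without loading the buffer, the actual occupancy in this model never exceeds the virtual occupancy at a decode time.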

These program streams PS0, PS1 and PS2 can be 
stored in a recording medium 56 such as a disk. 

Next, Figs. 9A through 9I show the case where video 
data provided at two different camera angles and encoded 
are multiplexed. Fig. 9A illustrates the size of each 
access unit of the video V0 and Fig. 9B illustrates the size 
of each access unit of the video V1. In this embodiment, 
GOP0 and GOP1 are each constituted by four access 
units. The term "GOP" means "Group of Pictures" as 
prescribed in MPEG. Fig. 9C illustrates the size of each 
access unit of the virtual video obtained by comparing 
the sizes of the access units of the videos V0 and V1 and 
selecting the larger one. Fig. 9D illustrates the packetized condition 
of the virtual video resulting from scheduling of the 
virtual video. When the actual video is multiplexed, if any 
difference between the virtual and actual videos is 
caused, padding is performed as shown in Figs. 9E and 
9F. In this case, if a packet of the virtual video is 
packetized into three or more packets, padding packets 
thereof may be placed together on a rear side to merge 
the three or more packets into two packets. Assuming 
that the virtual video is multiplexed as a result of 
scheduling to produce a program stream as shown in Fig. 9G, 
the actual video is multiplexed according to the present 
embodiment to produce program streams as shown in 
Figs. 9H and 9I. 
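
The construction of the virtual video and the padding described above can be sketched as follows. The access unit sizes are hypothetical and are not read from Figs. 9A and 9B, and the function names are illustrative. The merging function reflects one reading of the rear-side placement of padding packets, namely that the video portions of one virtual packet are joined and followed by a single padding packet.

```python
def virtual_access_units(streams):
    """Select, at each decode time, the largest access unit size."""
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all streams need the same number of access units")
    return [max(sizes) for sizes in zip(*streams)]


def padding_sizes(stream, virtual):
    """Padding needed per access unit to reach the virtual size."""
    return [v - a for a, v in zip(stream, virtual)]


def merge_padding_to_rear(packets):
    """Merge the packets of one virtual packet into video plus rear padding."""
    video = sum(n for kind, n in packets if kind == "video")
    padding = sum(n for kind, n in packets if kind == "padding")
    merged = []
    if video:
        merged.append(("video", video))
    if padding:
        merged.append(("padding", padding))
    return merged


v0 = [9, 3, 4, 2]   # hypothetical access unit sizes of video V0 (one GOP)
v1 = [7, 5, 4, 6]   # hypothetical access unit sizes of video V1 (one GOP)
virtual = virtual_access_units([v0, v1])
print(virtual)
print(padding_sizes(v0, virtual), padding_sizes(v1, virtual))
print(merge_padding_to_rear([("video", 2), ("padding", 1), ("video", 3)]))
```

At every decode time at least one of the streams needs no padding, since the virtual size is taken from the largest actual access unit.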

Video image and sound can be reproduced in a 
seamless manner without inappropriate operative 
condition of the buffer even in the case where the 
thus-produced program streams are decoded while being 
optionally switched from one to another as indicated by 
arrows in Figs. 9H and 9I. Incidentally, in this embodiment, 
there is considered the case where two videos are 
multiplexed. However, as will be apparently appreciated, 
the present invention can be applied to the case where 
three or more videos are multiplexed. 
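
The effect of switching at entry points can be illustrated by a short sketch. It assumes that the entry points coincide with GOP boundaries of four access units, uses hypothetical access unit sizes, and checks only that every access unit of a switched sequence fits within the corresponding virtual access unit; it is not a full simulation of the decoder buffer.

```python
def switch_at_entry_points(streams, choice_per_gop, gop_len):
    """Build the access unit sequence seen by a decoder that selects
    stream choice_per_gop[g] for the g-th GOP."""
    sequence = []
    for g, index in enumerate(choice_per_gop):
        sequence.extend(streams[index][g * gop_len:(g + 1) * gop_len])
    return sequence


v0 = [9, 3, 4, 2, 5, 5, 1, 8]   # hypothetical sizes, GOP0 then GOP1
v1 = [7, 5, 4, 6, 6, 2, 3, 8]
virtual = [max(pair) for pair in zip(v0, v1)]

# Decode GOP0 from V0, then switch to V1 for GOP1.
switched = switch_at_entry_points([v0, v1], [0, 1], gop_len=4)
print(switched)
print(all(a <= v for a, v in zip(switched, virtual)))
```

Whichever stream is selected for each GOP, the switched sequence is bounded by the same virtual video that was used for scheduling.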

In the afore-mentioned embodiments, the method 
according to the present invention is applied to a program 
stream. However, the method according to the present 
invention can also be applied to a transport stream used 
for the purpose of transmission. In the case of the 
transport stream, one stream is constituted by a plurality of 
channels and each channel corresponds to an 
independent stream. Further, each channel independently 
has a time base. Accordingly, by applying the 
afore-mentioned method used for a plurality of program 
streams to the respective channels, the plural channels 
serving as streams can be multiplexed over one 
transport stream to transmit the thus-multiplexed transport 
stream, so that it is possible to reproduce video images 
provided at different camera angles in a seamless 
manner when switching the plural channels from one to 
another. Further, in the afore-mentioned embodiments, 
there is considered the case where video images 
supplied from different camera angles are multiplexed. 
However, as will be apparently understood, the present 
invention can be applied to images which have no 
relation to each other. Further, the present invention can 
also be applied to audio data or other data as well as 
the videos. 

Sizes of access units as an encoding unit of a 
plurality of bit streams composed of digital signals, and 
decode times are detected, and the thus-detected sizes of 
the access units are compared with each other at each 
decode time to select the maximum values thereof. 
Next, a virtual stream is constructed so as to have 
access units whose sizes are each equal to the 
above-selected maximum value at each decode time. In the 
case where the respective bit streams of the digital 
signals are packetized, if the size of each access unit is 
less than that of the afore-mentioned virtual stream, the 
bit streams are packetized by using padding packets 
each having a size identical to a difference in size 
between the access units. Accordingly, the plural bit 
streams can be reproduced in a continuous manner 
without any inappropriate operative condition of the 
decoder buffer even when they are decoded while being 
switched from one to another. 

That is, when the multiplexing of data is performed 
for a system having a function capable of reproducing 
a plurality of multiplexed streams while being switched 
from one to another, sizes of access units of each 
stream and decode times are determined from each 
elementary stream, and the maximum value of the access 
units at every decode time is selected to establish a 
virtual video stream which serves for scheduling of the 
multiplexing. In the case where the actual video is 
multiplexed, the difference between the virtual and actual 
videos is compensated by padding, so that the plural multiplexed 
streams are reproduced while being switched from one 
to another without any inappropriate operative condition 
of the buffer and without interruption of video image and 
sound reproduced. These plural multiplexed streams 
may be stored in respective portions of a recording me- 
dium. Alternatively, the plural multiplexed streams can 
be merged into one transport stream for the purpose of 
broadcasting. 

As will be apparently appreciated, various changes 
or modifications can be made without departing from the 
scope of the present invention. Accordingly, the present 
invention is not intended to be limited to only the afore- 
mentioned embodiments. 

In summary, at least embodiments of the invention 
relate to a method and apparatus for encoding a digital 
signal, a recording medium used for storing the digital 
signal and a method and apparatus for transmitting the 
digital signal, which are suitable for recording a dynamic 
image signal or a sound signal on a recording medium 
such as a magnetooptical disk or magnetic tape and re- 
producing these signals from the recording medium so 
as to display a video image on a monitor, or for trans- 
mitting the dynamic image signal or the sound signal 
through a transmission line from a transmitting side to 
a receiving side where a video image or sound is 
reproduced, in a video conference system, video telephone 
system, broadcasting equipment or the like. 



Claims 

1. A method for encoding a digital signal, comprising 
the steps of: 

receiving a plurality of digital signal bit streams; 
detecting sizes of access units, as an encoding 
unit, of said plurality of digital signal bit streams, 
and decode times thereof; 
comparing the detected sizes of the access 
units at each decode time, with each other to 
select a maximum value thereof; 
providing a virtual stream composed of access 
units each having a size identical to the selected 
maximum value at each decode time; and 
packetizing each of said plurality of digital signal 
bit streams, 

said plurality of digital signal bit streams being 
packetized by using padding packets each having 
a size corresponding to a difference in size 
between each access unit of said plurality of 
digital signal bit streams and that of said virtual 
stream, when the size of the access unit of said 
plurality of digital signal bit streams is less than 
the size of the access unit of said virtual stream. 

2. A method according to claim 1 wherein said plurality 
of digital signal video streams are video streams, 
further comprising the steps of: 

receiving at least an audio stream; and 
determining a supply time at which said virtual 
stream as a virtual video stream and said at 
least an audio stream are supplied to a decoder, 
and a size of each packet produced, 
said video streams and said at least an audio 
stream being packetized by using information 
concerning said supply time and said size of 
each packet produced. 

3. An apparatus for encoding a digital signal, 
comprising: 

a receiving terminal for receiving a plurality of 
digital signal bit streams; 
an access unit detecting device for detecting 
sizes of access units, as an encoding unit, of 
said plurality of digital signal bit streams, and 
decode times thereof; 
a maximum value detecting device for comparing 
the detected sizes of the access units at 
each decode time, with each other to select a 
maximum value thereof; 
a scheduler for providing a virtual stream composed 
of access units each having a size identical 
to the selected maximum value at each decode 
time; and 
a packetizing device for packetizing each of 
said plurality of digital signal bit streams, 

said plurality of digital signal bit streams being 
packetized by using padding packets each having 
a size corresponding to a difference in size 
between each access unit of said plurality of 
digital signal bit streams and that of said virtual 
stream, when the size of the access unit of said 
plurality of digital signal bit streams is less than 
the size of the access unit of said virtual stream. 

4. An apparatus according to claim 3 wherein said 
plurality of digital signal video streams are video 
streams, further comprising a receiving terminal for 
receiving at least an audio stream, 
said scheduler determining a supply time at 
which said virtual stream as a virtual video stream 
and said at least an audio stream are supplied to a 
decoder, and a size of each packet produced, and 
said video streams and said at least an audio 
stream being packetized by using information 
concerning said supply time and said size of each 
packet produced. 

5. A method for transmitting a digital signal, comprising 
the steps of: 

receiving a plurality of digital signal bit streams; 
detecting sizes of access units, as an encoding 
unit, of said plurality of digital signal bit streams, 
and decode times thereof; 
comparing the detected sizes of the access 
units at each decode time, with each other to 
select a maximum value thereof; 
providing a virtual stream composed of access 
units each having a size identical to the selected 
maximum value at each decode time; and 
packetizing each of said plurality of digital signal 
bit streams, 

said plurality of digital signal bit streams being 
packetized by using padding packets each having 
a size corresponding to a difference in size 
between each access unit of said plurality of 
digital signal bit streams and that of said virtual 
stream, when the size of the access unit of said 
plurality of digital signal bit streams is less than 
the size of the access unit of said virtual stream, 
thereby transmitting one stream produced by 
packetizing said plurality of digital signal bit 
streams. 



6. A method according to claim 5 wherein said plurality 
of digital signal video streams are video streams, 
further comprising the steps of: 

receiving at least an audio stream; and 
determining a supply time at which said virtual 
stream as a virtual video stream and said at 
least an audio stream are supplied to a decoder, 
and a size of each packet produced, 
said video streams and said at least an audio 
stream being packetized by using information 
concerning said supply time and said size of 
each packet produced. 

7. A recording medium on which a recording signal is 
stored, said recording medium being produced by 
a method comprising the steps of: 

receiving a plurality of digital signal bit streams; 
detecting sizes of access units, as an encoding 
unit, of said plurality of digital signal bit streams, 
and decode times thereof; 
comparing the detected sizes of the access 
units at each decode time, with each other to 
select a maximum value thereof; 
providing a virtual stream composed of access 
units each having a size identical to the selected 
maximum value at each decode time; and 
packetizing each of said plurality of digital signal 
bit streams, 

said plurality of digital signal bit streams being 
packetized by using padding packets each having 
a size corresponding to a difference in size 
between each access unit of said plurality of 
digital signal bit streams and that of said virtual 
stream, when the size of the access unit of said 
plurality of digital signal bit streams is less than 
the size of the access unit of said virtual stream. 

8. A recording medium according to claim 7 wherein 
said plurality of digital signal video streams are video 
streams, said method for producing the recording 
medium further comprising the steps of: 

receiving at least an audio stream; and 
determining a supply time at which said virtual 
stream as a virtual video stream and said at 
least an audio stream are supplied to a decoder, 
and a size of each packet produced, 
said video streams and said at least an audio 
stream being packetized by using information 
concerning said supply time and said size of 
each packet produced. 